Ólavur Mortensen | RARE Technologies

New Gensim feature: Author-topic modeling. LDA with metadata.

Ólavur Mortensen 2017-01-18 gensim

The author-topic model is an extension of Latent Dirichlet Allocation that allows data scientists to build topic representations of attached author labels. These author labels can represent any kind of discrete metadata attached to documents, for example, tags on posts on the web. In December of 2016, I wrote a blog post explaining that a Gensim implementation was on its …

Topic Modelling with Latent Dirichlet Allocation: How to pre-process data and tune your model. New tutorial.

Ólavur Mortensen 2016-12-18 gensim, Machine Learning, Open Source, programming, Student Incubator

If you’ve learned how to train topic models in Gensim but aren’t able to get satisfying results, we have a new tutorial on GitHub that will help you get on the right track. Primarily, you will learn about pre-processing text data for the LDA model. You will also get some tips about how to set the parameters …
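The tutorial's own code is not reproduced here. As a hedged illustration of the kind of pre-processing involved, the plain-Python sketch below lowercases and tokenizes the text, drops numbers, one-character tokens and a tiny hypothetical stopword list, and then removes rare and very common words by document frequency. The `no_below`/`no_above` names echo Gensim's `Dictionary.filter_extremes`, but this function is a stand-in for illustration, not Gensim's API.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "is", "in", "for", "the", "of", "and"}

def tokenize(doc):
    """Lowercase, split on non-word characters, drop numbers, 1-char tokens and stopwords."""
    tokens = re.findall(r"\w+", doc.lower())
    return [t for t in tokens
            if not t.isnumeric() and len(t) > 1 and t not in STOPWORDS]

def filter_extremes(docs, no_below=2, no_above=0.5):
    """Keep tokens that occur in at least `no_below` documents
    and in at most a `no_above` fraction of all documents."""
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    keep = {t for t, df in doc_freq.items()
            if df >= no_below and df / n_docs <= no_above}
    return [[t for t in doc if t in keep] for doc in docs]

raw = [
    "Topic models find 3 hidden topics in text.",
    "LDA is a topic model for text corpora.",
    "Pre-processing text is a key step for LDA.",
    "A good model needs clean input.",
]
docs = filter_extremes([tokenize(d) for d in raw], no_below=2, no_above=0.5)
print(docs)  # [['topic'], ['lda', 'topic', 'model'], ['lda'], ['model']]
```

Here "text" appears in three of four documents and is dropped as too common. Note that "models" and "model" are still counted separately; lemmatizing the tokens first would merge them.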

Author-topic models: why I am working on a new implementation

Ólavur Mortensen 2016-12-06 gensim, Machine Learning, Open Source, programming, Student Incubator

Author-topic models promise to give data scientists a tool to simultaneously gain insight into authorship and content in terms of latent topics. The model is closely related to Latent Dirichlet Allocation (LDA). Each author can be associated with multiple documents, and each document can be attributed to multiple authors. The model learns topic representations for each author, so that …
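In Gensim's `AuthorTopicModel`, this many-to-many relation is supplied as an `author2doc` dictionary that maps each author name to the list of document indices they contributed to (a `doc2author` mapping is also accepted). The stdlib-only sketch below just builds that mapping from hypothetical per-document author lists; the same structure works for any discrete metadata, such as tags on web posts. Training itself is left to Gensim.

```python
from collections import defaultdict

# Hypothetical toy metadata: document index -> author labels attached to it.
doc_authors = {
    0: ["alice"],
    1: ["alice", "bob"],   # a document with two authors
    2: ["bob", "carol"],
}

def build_author2doc(doc_authors):
    """Invert a doc -> authors mapping into author -> sorted list of doc indices."""
    author2doc = defaultdict(list)
    for doc_id, authors in doc_authors.items():
        for author in authors:
            author2doc[author].append(doc_id)
    return {author: sorted(ids) for author, ids in author2doc.items()}

author2doc = build_author2doc(doc_authors)
print(author2doc)  # {'alice': [0, 1], 'bob': [1, 2], 'carol': [2]}
```

Each author maps to several documents and document 1 appears under two authors, which is exactly the structure the author-topic model learns from.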

Text Summarization with Gensim

Ólavur Mortensen 2015-08-24 programming

Text summarization is one of the most exciting fields in NLP, allowing developers to quickly find meaning and extract keywords and key phrases from documents. RaRe Technologies’ newest intern, Ólavur Mortensen, walks the reader through the text summarization features in Gensim.
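Gensim's summarization module ranked text using a variation of the TextRank algorithm. As a rough illustration of the keyword side only, here is a minimal TextRank-style sketch in plain Python: words that occur near each other are linked in a graph, and a PageRank-style iteration scores them. The function name, stopword list and default parameters are hypothetical; this is not Gensim's implementation.

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=3,
                      stopwords=frozenset({"a", "an", "the", "of", "and", "is", "to", "in"})):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    # Undirected graph: link each word to the next `window - 1` words.
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    # Iterate the PageRank update with damping factor `damping`.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

text = "gensim topic model. gensim summarization. gensim keywords."
print(textrank_keywords(text, top_n=1))  # ['gensim']
```

"gensim" wins because it is linked to every other word; a fuller implementation would also merge adjacent top-ranked words into key phrases.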